Flattening network data for causal discovery: What could go wrong?

Authors

  • Marc Maier
  • Katerina Marazopoulou
  • David Arbour
  • David Jensen
Abstract

Methods for learning causal dependencies from observational data have been the focus of decades of work in social science, statistics, machine learning, and philosophy [9, 10, 11]. Much of the theoretical and practical work on causal discovery has focused on propositional representations. Propositional models effectively represent individual directed causal dependencies (e.g., path analysis, Bayesian networks) or conditional distributions of some outcome variable (e.g., linear regression, decision trees). However, propositional representations are limited to modeling independent and identically distributed (IID) data of a single entity type. Many real-world systems involve heterogeneous, interacting entities with probabilistic dependencies that cross the boundaries of those entities (i.e., non-IID data with multiple entity types and relationships). These systems produce network, or relational, data, and they are of paramount interest to researchers and practitioners across a wide range of disciplines. To model such data, researchers in statistics and computer science have devised more expressive classes of directed graphical models, such as probabilistic relational models (PRMs) [2] and directed acyclic probabilistic entity-relationship (DAPER) models [4]. Despite the assumptions embedded in propositional models, a common practice is to flatten, or propositionalize, relational data and use existing algorithms [5] (see Figure 1, focusing on algorithms that learn causal graphical models). While there are statistical concerns, this process is generally innocuous if the task is to model statistical associations for predictive inference. In contrast, to learn causal structure, estimate causal effects, or support inference over interventions, the effects of flattening inherently relational data can be particularly deleterious.
In this paper, we identify four classes of potential issues that can occur with a propositionalization strategy as opposed to embracing a more expressive representation that would not succumb to these problems. We also present empirical results comparing the effectiveness of two theoretically sound and complete algorithms that learn causal structure: PC—a widely used constraint-based, propositional algorithm for causal discovery [11], and RCD—a recently developed constraint-based algorithm that reasons over a relational representation [6].
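To make the notion of flattening concrete, the following is a minimal sketch of propositionalization on a toy relational dataset. The entity types (employees, products), the `develops` relationship, the attribute names, and the choice of mean and count as aggregates are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy relational data (illustrative only, not from the paper).
employees = {"e1": {"skill": 3}, "e2": {"skill": 5}, "e3": {"skill": 4}}
products = {"p1": {"success": 1}, "p2": {"success": 0}}
# Many-to-many relationship: which employee develops which product.
develops = [("e1", "p1"), ("e2", "p1"), ("e2", "p2"), ("e3", "p2")]


def flatten_by_product(employees, products, develops):
    """Return one row per product, aggregating skills of related employees."""
    skills = defaultdict(list)
    for emp_id, prod_id in develops:
        skills[prod_id].append(employees[emp_id]["skill"])

    rows = []
    for prod_id, attrs in sorted(products.items()):
        related = skills.get(prod_id, [])
        rows.append({
            "product": prod_id,
            "success": attrs["success"],
            # Aggregation collapses a set of related values into one column.
            "mean_skill": mean(related) if related else None,
            "n_employees": len(related),
        })
    return rows


flat_rows = flatten_by_product(employees, products, develops)
for row in flat_rows:
    print(row)
```

In this toy example, employee `e2` contributes to the rows of both products, so the flattened rows share information and are not independent of each other, even though a propositional algorithm would treat them as IID samples.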

Similar resources

Beyond Understanding and Prediction: Data Mining for Action

Association analysis and prediction are two major tasks in data mining, and they represent two foremost objectives: data exploration for understanding and model construction for prediction. Data mining is known as a process to convert raw data to useful information --knowledge. However, what do we do with the knowledge discovered from data? We will need knowledge to enable actions, such as prev...


Deep Convolutional Neural Networks for Pairwise Causality

Discovering causal models from observational and interventional data is an important first step preceding what-if analysis or counterfactual reasoning. As has been shown before [1], the direction of pairwise causal relations can, under certain conditions, be inferred from observational data via standard gradient-boosted classifiers (GBC) using carefully engineered statistical features. In this p...


Latent Variable Discovery Using Dependency Patterns

The causal discovery of Bayesian networks is an active and important research area, and it is based upon searching the space of causal models for those which can best explain a pattern of probabilistic dependencies shown in the data. However, some of those dependencies are generated by causal structures involving variables which have not been measured, i.e., latent variables. Some such patterns...


What You Can Learn from Wrong Causal Models

It is common for social science researchers to provide estimates of causal effects from regression models imposed on observational data. The many problems with such work are well documented and widely known. The usual response is to claim, with little real evidence, that the causal model is close enough to the “truth” that sufficiently accurate causal effects can be estimated. In this chapter, ...


The end of the net as we know it? Deep packet inspection and internet governance

Recent congressional hearings on the heretofore little-known concept of network neutrality prove once again that the devil is in the details. It is gratifying that the United States, which has long drifted without a policy rudder on broadband networking and has fallen behind numerous countries in broadband uptake, is now grappling with how to convert and augment our legacy communications syste...


Publication date: 2013